Using deep transfer learning to detect scoliosis and spondylolisthesis from x-ray images

Recent years have witnessed a wider prevalence of vertebral column pathologies due to lifestyle changes, sedentary behaviors, or injuries. Spondylolisthesis and scoliosis are two of the most common ailments, with an incidence of 5% and 3% in the United States population, respectively. Both of these abnormalities can affect children at a young age and, if left untreated, can progress into severe pain. Moreover, severe scoliosis can even lead to lung and heart problems. Thus, early diagnosis can make it easier to apply remedies/interventions and prevent further disease progression. Current diagnosis methods are based on visual inspection of radiographs by physicians and/or calculation of certain angles (e.g., Cobb angle). Traditional artificial intelligence-based diagnosis systems utilized these parameters to perform automated classification, which enabled fast and easy diagnosis support tools. However, they still require specialists to perform tedious, error-prone measurements. To this end, automated measurement tools were proposed based on processing techniques of X-ray images. In this paper, we utilize advances in deep transfer learning to diagnose spondylolisthesis and scoliosis from X-ray images without the need for any measurements. We collected raw data from real X-ray images of 338 subjects (i.e., 188 scoliosis, 79 spondylolisthesis, and 71 healthy). Deep transfer learning models were developed to perform three-class classification as well as pair-wise binary classifications among the three classes. The highest mean accuracy and maximum accuracy for three-class classification were 96.73% and 98.02%, respectively. Regarding pair-wise binary classification, high accuracy values were achieved for most of the models (i.e., > 98%). These results and other performance metrics reflect a robust ability to diagnose the subjects' vertebral column disorders from standard X-ray images. The current study provides a supporting tool that can reasonably help physicians make the correct early diagnosis with less effort and fewer errors, and reduce the need for surgical interventions.


Introduction
The spinal column comprises 33 small bones called vertebrae, which are classified into five distinct areas: cervical, thoracic, lumbar, sacral, and coccygeal. It is essential for human body motion and stability. More importantly, the spinal column provides protection for the spinal cord and nerve roots. The spinal cord is part of the central nervous system (CNS) and is responsible for carrying sense and movement information from and to the brain. Hence, the degeneration of the spine results in a wide range of ailments (e.g., restricted motion, pain, numbness, etc.), and reduces the quality of life in general [1].
Several pathologies can affect the vertebral column. In this paper, we examine two types of degenerative pathologies: scoliosis and spondylolisthesis. Scoliosis is a curvature of the thoracic or lumbar spine in the coronal plane (i.e., sideways). It is diagnosed by the specialist using X-ray images of the spine and possibly magnetic resonance imaging (MRI) to rule out tumors [2]. More specifically, the Cobb angle is measured on the image of the vertebral column, and a value > 10˚ indicates scoliosis [3]. In addition, other signs can indicate scoliosis (e.g., uneven shoulders, waist, hips, or ribcage). Scoliosis is a common spinal disorder with a prevalence of 0.47-5.2% depending on the country [2]. For example, it is estimated that 6 to 9 million people in the United States suffer from some degree of scoliosis [4]. Spondylolisthesis is a condition caused by an injured vertebra shifting or slipping forward on the vertebra directly below it [1]. This is typically categorized into different grades depending on the degree of slippage (e.g., low grade vs high grade) [5]. Spondylolisthesis exhibits a prevalence of 6% in the adult population [6], and can cause difficulties in standing and walking, numbness, or weakness in one or both legs [5].
The process of diagnosing spinal column disorders starts with a physical examination. In this step, the doctor investigates the patient's medical history, participation in sports/physical activity, and involvement in accidents. Moreover, the back and spine need to be carefully examined for signs of abnormal shape, restricted range of motion, or muscle weakness/spasm. In addition, the examination involves performing posture and gait analysis [5]. Once an initial diagnosis is made, the next step is radiological examination. X-ray images of the back provide more information about the structure of the spine and the existence of fractures, infections, or other abnormalities. Computed tomography (CT) images are useful for inspecting the spinal canal, whereas magnetic resonance imaging (MRI) shows the spinal cord, nerve roots, and their surroundings [4,5]. These imaging tests enable the objective determination of biomechanical features (e.g., Cobb angle) and represent a gold standard for the diagnosis of vertebral column ailments [1]. These images are normally taken laterally or from an anterior/posterior view of the patient's back. However, the measurement accuracy of the biomechanical angles is subjective and depends on the experience of the specialist (i.e., radiologist or orthopedist). Moreover, high case workload, stress, urgency, or lack of qualified specialists can lead to errors and incorrect diagnosis.
The medical literature in relation to the health of the vertebral column has focused primarily on extracting biomechanical parameters that objectively determine and quantify the disease state of the spine. To this end, scoliosis and its severity can be diagnosed using the Cobb angle, which was described by John Cobb in 1948 and represents the gold standard. However, it has some shortcomings relating to measurement difficulties and to 3D deformities [7]. Similarly, spondylolisthesis can be determined from several parameters that can be measured directly from radiographs. Some of these include sacral slope, lumbar lordosis, and pelvic incidence. Statistical analysis results in the literature showed significant differences in these parameters across different disease states and normal subjects [1].
The research landscape using machine learning (ML) and artificial intelligence (AI) followed a similar path to that of the medical literature by designing algorithms that can automatically extract the aforementioned biomechanical markers of disease from medical images [8][9][10][11], which can be utilized by the specialists for diagnosis. Furthermore, these parameters can be utilized as features for AI-based diagnosis by classifying images into healthy and different disease classes [1,12,13]. However, the accuracy of such methods is either low [14][15][16] or highly dependent on the accuracy of measurement of the biomechanical parameters [1,12,17]. In contrast, the work in this paper does not require any explicit measurements of any parameters. It relies on the feature extraction capabilities of deep learning convolutional neural networks to automatically determine the disease class of the input X-ray images. Thus, it eliminates compounded errors and the need for multiple diagnosis steps and complex image processing algorithms.
Recently, deep learning AI architectures have enabled more innovation in disease diagnosis from medical images. For example, Mahajan et al. [18,19] and Raina et al. [20] employed a single shot multibox detector (SSD) in combination with deep transfer learning models to detect COVID-19 infections from chest X-ray (CXR) images, and achieved high levels of performance in terms of precision (i.e., 93.01%). In the context of scoliosis, Yang et al. [16] used unclothed back images, after bounding the region of interest (i.e., the subject's back) using a faster region-based convolutional neural network (Faster R-CNN), as input to the ResNet architecture. They reported an average accuracy of 80% for scoliosis screening, but the performance was low on an external validation dataset (i.e., 55.5%-87%). In a similar study, Kokabu et al. [21] used a combination of 3D depth sensors and a custom-made convolutional neural network (CNN) to measure the Cobb angle from nude back images. Although their study employed additional hardware, the results show very low specificity (42%-78%). More importantly, the authors should have reported the absolute percentage error, as the dataset contains a varying range of Cobb angles (0˚-64˚) and the absolute error does not fully reflect the performance of the model (e.g., an error of 5˚ on a 10˚ angle is different from an error of 5˚ on a 50˚ angle). The approach proposed in this paper does not require extra hardware and achieves superior performance.
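The difference between absolute and percentage error noted above can be illustrated with a minimal sketch. The function name and the sample angles are hypothetical and simply mirror the 5-of-10 versus 5-of-50 example; they are not taken from the cited study.

```python
def absolute_percentage_error(true_angle, predicted_angle):
    """Return the absolute error as a percentage of the true Cobb angle."""
    if true_angle == 0:
        # A relative error has no meaning when the reference angle is zero.
        raise ValueError("percentage error is undefined for a true angle of 0")
    return abs(predicted_angle - true_angle) * 100.0 / abs(true_angle)


# Two hypothetical predictions with the same 5-degree absolute error.
mild = absolute_percentage_error(10.0, 15.0)
severe = absolute_percentage_error(50.0, 55.0)
print(mild, severe)
```

The same 5˚ absolute error corresponds to a much larger relative error for the small angle, which is why the absolute error alone can hide poor performance near the 10˚ diagnostic threshold.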
The Cobb angle is typically measured using X-ray images. Hence, Tan et al. [22] used a combination of image processing techniques and the U-net deep learning architecture to determine the location of the vertebrae of interest and subsequently measure the Cobb angle. A wide range of approaches for Cobb angle measurement and scoliosis detection is surveyed by Karpiel et al. [8]. Classification techniques were also used to distinguish various scoliosis-related classes. Wang et al. [15] designed a deep learning model to differentiate between progressive (P) and nonprogressive (NP) classes at the first clinic visit. Vergari et al. [23] combined a CNN with discriminant analysis to determine the type of scoliosis treatment appearing in the X-ray image (i.e., brace, spinal implant, or neither). Although their study did not aim to diagnose scoliosis, the authors claim that their work will facilitate the processing of large databases for such research purposes. Colombo et al. [14] used video raster stereography (RST) as an input to supervised and unsupervised machine learning models, and extracted representative features of scoliosis in comparison to healthy subjects. They reported an accuracy range of 84.9%-87.5%. These traditional approaches still rely on explicit feature extraction and image processing techniques.
A similar path was taken in the literature for spondylolisthesis identification. Neto et al. [24] used non-deep machine learning techniques (e.g., Support Vector Machine) to differentiate healthy subjects from those suffering from spondylolisthesis/disk herniation. They used X-ray images as an input and extracted six biomechanical attributes that are markers of the disease states and form the features for classification. They achieved an 85.9% maximum accuracy. This methodology of processing X-ray images to extract disease features and using various classical (i.e., non-deep) machine learning algorithms (e.g., multilayer perceptron) and processing techniques (e.g., clustering) was taken by several related works [1,12,25,26]. However, such explicit extraction of measurements and features may complicate usability and can be error prone [27]. Liao et al. [28] proposed automatic spondylolisthesis measurement using CT images as input. The idea of such approaches is that computerized methods can achieve better accuracy in detecting vertebra edges, features, keypoints, or segmental motion angles [27,29], so that spondylolisthesis can be accurately determined/graded. This literature suffers from the same aforementioned shortcomings in terms of accuracy, explicit processing, or multiple stages of diagnosis.
The contributions of this paper are as follows:
• We develop a reliable artificial intelligence system for the diagnosis of scoliosis and spondylolisthesis based on radiographic X-ray images of the vertebral column. Such a system can provide support for clinical diagnosis decisions, and reduce errors and overhead.
• We collect X-ray images of subjects suffering from scoliosis and spondylolisthesis, as well as healthy ones, as determined by the specialists in the hospital. This dataset will expand and enrich any comparable publicly available datasets, enable the development of automated machine learning and AI algorithms for the detection of vertebral ailments, and can be used for training and educating medical students, residents, and specialists.
• We investigate several deep learning convolutional neural network models for the classification of scoliosis, spondylolisthesis, and normal X-ray images using transfer learning.
• We evaluate the performance of the deep learning models for three-class (scoliosis vs spondylolisthesis vs normal) and pair-wise classification problems (scoliosis vs spondylolisthesis, scoliosis vs normal, and spondylolisthesis vs normal). The cost of each model in terms of training and testing times was also evaluated.
The rest of this paper is organized as follows. In the materials and methods section, we present the data collection procedure, subjects, deep learning models, performance evaluation setup, and performance metrics. The results section provides the results in detail and a discussion of the various observations. The conclusion section presents future work and concludes the paper.

Materials and methods
The work in this paper exploits the abilities of generically pre-trained convolutional neural network models to automatically classify X-ray images into three possible spine-related conditions: scoliosis, spondylolisthesis, or normal (i.e., healthy). The approach achieves high performance metrics while requiring neither manual or automatic measurements nor any feature extraction, as this is inherently done by the deep learning architecture. In addition, no elaborate image processing or modeling is required. Fig 1 shows the general steps for customizing the pre-trained models for classification of the X-ray images into normal (i.e., healthy), scoliosis, or spondylolisthesis.

Subjects and data collection
The current study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (IRB) at King Abdullah University Hospital (KAUH). Written informed consent was obtained from all subjects involved in the study (or their parents in the case of minors). The diagnosis was determined by two orthopedic specialists at KAUH.
The dataset included 338 subjects (240 females, 98 males) with an age range from 9 months to 79 years and mean ± SD of 24.9 ± 18.58 years. The number of subjects with normal X-ray images was 71 (40 females, 31 males) with an age range of 9 months to 56 years and mean ± SD of 19.41 ± 11.19 years. The number of subjects diagnosed with spondylolisthesis was 79 (49 females, 30 males) with an age range of 15-79 years and mean ± SD of 53.59 ± 14.02 years. The number of subjects diagnosed with scoliosis was 188 (151 females, 37 males) with an age range of 5-35 years and mean ± SD of 14.73 ± 3.36 years.

Deep learning models
Typically, the main input to the diagnosis of vertebral column diseases is medical images (i.e., X-ray, CT, or MRI). Hence, convolutional neural networks (CNNs) were used to classify the input into the possible disease state. CNNs are a type of feed forward neural networks with a deep architecture and form the basis for a major part of the deep learning models (DNNs) in the literature. Other types include Recurrent Neural Network (RNN) with variations (e.g., Long Short Term Memory (LSTM), and transformers), and Generative adversarial networks (GANs). CNNs have been found to be useful for image processing and classification as they are able to extract patterns and features in images regardless of scaling, mirroring, rotation, or translation.
The CNN generally comprises several types of layers and takes a tensor of order 3 as input (i.e., an image with N rows, M columns, and 3 (RGB) color channels). Convolution layers scan the image looking for correlated regions (e.g., vertebra). The input image is divided into small subparts called receptive fields, which in turn are grouped into feature maps. Each feature map has a corresponding weight matrix (i.e., kernel), which is learned/updated during training. A rectified linear unit (ReLU) usually follows the convolution layer and introduces nonlinearity into the CNN. Pooling layers reduce the dimensionality of the feature maps feeding into subsequent layers by considering subparts of the feature map and taking the maximum (i.e., max-pooling), average (i.e., average-pooling), or other statistical measure. Fully connected layers are similar to multilayer perceptron (MLP) networks and ensure that all elements in the previous layer contribute to the output or following layer. Dropout layers remove certain elements of the network in order to prevent overfitting and improve model generalization. The mathematical foundations, benefits, alternatives, and tradeoffs are well-established in the literature and beyond the scope of this work [30].
Transfer learning utilizes pre-trained deep learning models, which were developed using millions of images from ImageNet [31] and other databases (e.g., Places365 [32]). The models are able to classify images into hundreds of categories. However, they can be tailored and retrained to perform new tasks using transfer learning. For this to work, the final layer needs to be changed to match the number of output classes in the new task. Depending on the model, the final layer could be a FullyConnectedLayer or a Convolution2DLayer, and needs to be replaced accordingly with a number of filters equal to the number of output classes. As for the input, each model requires images to be of a certain dimension (e.g., [224 224 3]), which requires resizing. In addition, grayscale images (i.e., 2D) need to be transformed to RGB (i.e., 3D) images.
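The two input adjustments described above (resizing and grayscale-to-RGB conversion) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the procedure used in the study: the function names are hypothetical, the image is represented as nested lists, and nearest-neighbor sampling is assumed for simplicity, since the text does not state which interpolation was applied.

```python
def gray_to_rgb(gray):
    """Replicate a 2D grayscale image (list of rows) into 3 identical channels."""
    return [[[pixel, pixel, pixel] for pixel in row] for row in gray]


def resize_nearest(image, out_rows, out_cols):
    """Resize a 2D image with nearest-neighbor sampling (an assumed choice)."""
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


def prepare_input(gray, input_size):
    """Resize a grayscale image, then expand it to [rows, cols, 3]."""
    rows, cols, channels = input_size
    if channels != 3:
        raise ValueError("the pre-trained models described here expect 3 channels")
    return gray_to_rgb(resize_nearest(gray, rows, cols))


# Hypothetical 4x6 grayscale "radiograph" with intensities 0..23.
toy = [[r * 6 + c for c in range(6)] for r in range(4)]
x = prepare_input(toy, (224, 224, 3))
print(len(x), len(x[0]), len(x[0][0]))
```

Replicating the single intensity channel three times leaves the image content unchanged while matching the tensor shape that the ImageNet-trained networks expect.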
The following is a short description of the 14 convolutional neural network models used in this paper:
• SqueezeNet is 18 layers deep with an image input size of [227 227 3]. It was designed with the premise that smaller deep neural networks can offer comparable accuracy levels to large architectures but with the advantages of lesser inter-process communication, faster deployment on end-user machines, and more suitability to resource-limited environments. The model was pre-trained using the ImageNet database [31] to classify images into 1000 possible object classes (e.g., screwdriver, car, etc.). In this paper, SqueezeNet v1.1 was used, which provides the same accuracy as SqueezeNet v1.0 but with less computational overhead [33].
• GoogLeNet is 22 layers deep with an image input size of [224 224 3]. It is part of the family of Inception deep learning models and it is marked by the improved utilization of the computing resources, which allowed for increasing the depth and width of the network without any additional computational cost [34]. The model is available pre-trained on images from ImageNet or Places365 [32]. The former was used in this work.
• Inception-v3 is the third version of the Inception models, which improves on the previous two by having more parameters (e.g., utilizing three different filter sizes in the parallel convolution layers). The model is 48 layers deep with an image input size of [299 299 3] and is pre-trained on images from ImageNet [35].
• DenseNet-201, as the name suggests, is 201 layers deep with an image input size of [224 224 3]. The model represents a big jump in the number of layers compared to others. This was made possible by shortening the connections between layers close to the input/output. Connections between layers are made such that each layer feeds into later layers, which improves feature propagation/reuse and drastically reduces the number of parameters [36].
• MobileNets is 53 layers deep with an image input size of [224 224 3]. It is a network designed for mobile environments. Thus, the model is required to be efficient and small by reducing the memory requirements. This is achieved by inverted residual bottleneck layers that require computation that can be scheduled with minimum working set (i.e., number of tensors concurrently stored in memory) [37].
• ResNet-101, ResNet-50, and ResNet-18. The ResNet family of models, with the corresponding layer depths, requires the same image input size of [224 224 3] and is pre-trained on the ImageNet database. The architecture is characterized by a network-in-network scheme that learns residual functions with reference to the layer inputs [38]. It is a winner of the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).
• The Xception model is 71 layers deep with an image input size of [299 299 3]. It is trained on images from the ImageNet database. The architecture improves on the Inception network by replacing the standard inception modules with depthwise separable convolutions [39].
• The Inception-ResNet-v2 model is 164 layers deep with an image input size of [299 299 3]. It is trained on images from the ImageNet database. The architecture is a hybrid of the Inception model and residual connections, which results in faster training [40].
• ShuffleNet is another model designed for resource-limited deployment environments. It is based on pointwise group convolutions and channel shuffling to drastically reduce the computational overhead without sacrificing the classification accuracy [41]. The model is pre-trained using the ImageNet database and requires an image input size of [224 224 3].
• NASNet-Mobile is the mobile version of the Neural Architecture Search Network (NASNet) model. The main idea of this type of model is to learn the network architecture during training on the specific dataset using reinforcement learning search. Converging to the best model is reduced to finding the optimal cell structure (i.e., convolutional layer), which is duplicated to other convolutional networks but with different weights [42]. The model is pre-trained on the ImageNet database and requires an image input size of [224 224 3].
• DarkNet-53 is pre-trained on the ImageNet database and requires an input image of size [256 256 3]. The model is 53 layers deep and was designed with speed and object detection as primary objectives [43]. It improves on the previous version, DarkNet-19, by using more layers and employing residual connections [44].
• EfficientNet-b0 is the baseline EfficientNet architecture, which provides scaled models up to EfficientNet-b7. The architecture design is based on the idea of compound scaling, which uniformly scales the network depth, width, and input resolution by fixed scaling coefficients [45]. The model is pre-trained using the ImageNet database and requires an input image of size [224 224 3].
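For reference, the depths and input sizes stated in the list above can be gathered into a single lookup table, which is convenient when the same image has to be resized differently for each network. The following is a minimal sketch: the dictionary layout and the helper name are hypothetical, and `None` marks a depth that the descriptions above do not state.

```python
# name -> (layers deep, input size), as stated in the model descriptions.
# None marks a depth that the text does not give.
MODELS = {
    "SqueezeNet": (18, (227, 227, 3)),
    "GoogLeNet": (22, (224, 224, 3)),
    "Inception-v3": (48, (299, 299, 3)),
    "DenseNet-201": (201, (224, 224, 3)),
    "MobileNets": (53, (224, 224, 3)),
    "ResNet-101": (101, (224, 224, 3)),
    "ResNet-50": (50, (224, 224, 3)),
    "ResNet-18": (18, (224, 224, 3)),
    "Xception": (71, (299, 299, 3)),
    "Inception-ResNet-v2": (164, (299, 299, 3)),
    "ShuffleNet": (None, (224, 224, 3)),
    "NASNet-Mobile": (None, (224, 224, 3)),
    "DarkNet-53": (53, (256, 256, 3)),
    "EfficientNet-b0": (None, (224, 224, 3)),
}


def models_by_input_size(models):
    """Group model names by the input size they require."""
    groups = {}
    for name, (_depth, size) in models.items():
        groups.setdefault(size, []).append(name)
    return groups


for size, names in models_by_input_size(MODELS).items():
    print(size, len(names), names)
```

Grouping by input size shows that only four distinct resized copies of each X-ray image are needed to feed all 14 networks.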

Performance evaluation setup
The deep learning models were modified, trained, and evaluated using MATLAB R2021a software running on an HP OMEN 30L desktop GT13 with 64 GB RAM, NVIDIA® GeForce RTX™ 3080 GPU, Intel® Core™ i7-10700K CPU @ 3.80GHz, and 1TB SSD.
To prevent the models from overfitting specific image details, pixel translation (i.e., shifting the image) by 30 pixels vertically and horizontally was performed on the X-ray images used for training. Moreover, training images were randomly flipped along the x-axis (i.e., reflection), and rescaled by a factor drawn from the range [0.9, 1.1]. The model training options were set such that the mini-batch size was 10 (except for NASNet-Mobile, which had the size set to 2 due to slowness), the maximum number of epochs was 6, and the initial learning rate was 0.003. Moreover, the stochastic gradient descent with momentum (SGDM) optimizer was used for training due to its popularity and fast convergence [46]. The holdout method with a split of 70% training and 30% testing was used. Furthermore, to counter any bias in the data split, the experiments were repeated 40 times, and the minimum, maximum, average, and standard deviation (SD) were reported. In addition, samples of the training and validation curves were reported for the highest performing model for each classification problem.
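The repeated holdout protocol described above can be sketched as follows. This is a minimal illustration, not the MATLAB code used in the study: the function names are hypothetical, the split is assumed to be a plain (non-stratified) random shuffle because the text does not say otherwise, the sample standard deviation is assumed, and `dummy_evaluate` is a stand-in for training and testing a model.

```python
import random
import statistics


def holdout_split(items, test_fraction=0.30, rng=random):
    """Shuffle items and return (train, test) with about test_fraction held out."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def repeated_holdout(items, evaluate, repeats=40, seed=0):
    """Call evaluate(train, test) on fresh random splits and summarize the scores."""
    rng = random.Random(seed)
    scores = [evaluate(*holdout_split(items, rng=rng)) for _ in range(repeats)]
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD (an assumption)
    }


# Class sizes taken from the dataset description.
labels = ["scoliosis"] * 188 + ["spondylolisthesis"] * 79 + ["normal"] * 71


def dummy_evaluate(train, test):
    """Hypothetical stand-in for a model: fraction of scoliosis images in the test split."""
    return test.count("scoliosis") / len(test)


train, test = holdout_split(labels, rng=random.Random(0))
print(len(train), len(test))
print(repeated_holdout(labels, dummy_evaluate))
```

With 338 images, a 30% holdout yields a testing set of 101 images, and the smallest pair-wise problem (71 normal plus 79 spondylolisthesis subjects) yields 45, which matches the range of testing-set sizes reported in the results.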

Performance metrics
The performance of the models was evaluated using five metrics: precision, recall, specificity, F1 score, and accuracy. Precision is the ratio of true positives to all images identified as positive (i.e., including false positives). Recall (i.e., sensitivity) is the ratio of true positives to all relevant elements (i.e., the actual positives). Specificity, or the true negative rate, measures the ability to identify negative elements. The F1 score is the harmonic mean of the recall and precision and expresses the accuracy of classification in unbalanced datasets. The accuracy is defined as the ratio of the true positives for all classes to the number of instances (i.e., total images in the testing set). The five measures are defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{Accuracy} = \frac{\sum_{c} TP_{c}}{N}$$
where TP (true positives) is the number of correctly classified images (i.e., for each one of the classes), FP (false positives) is the number of images wrongly assigned to a class, FN (false negatives) is the number of images of a class missed by the classifier, TN (true negatives) is the number of images correctly identified as not belonging to a class, TP_c is the number of true positives for class c, and N is the total number of images in the testing set.
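The definitions above can be made concrete with a short sketch that derives the per-class counts from a confusion matrix in a one-vs-rest fashion. The function names are hypothetical, and the confusion matrix is an invented example for a 101-image testing set; it does not reproduce any result from this study.

```python
def per_class_metrics(confusion, labels):
    """Compute one-vs-rest metrics from a square confusion matrix.

    confusion[i][j] counts images of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # class k images that were missed
        fp = sum(row[k] for row in confusion) - tp   # other images assigned to class k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
    return results


def accuracy(confusion):
    """Sum of the true positives of all classes divided by the number of images."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
labels = ["normal", "scoliosis", "spondylolisthesis"]
confusion = [
    [20, 1, 0],
    [1, 55, 0],
    [1, 0, 23],
]
for label, values in per_class_metrics(confusion, labels).items():
    print(label, values)
print("accuracy", accuracy(confusion))
```

Reporting the metrics per class, rather than accuracy alone, exposes whether the larger scoliosis class dominates the overall score.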

Results and discussion
The purpose of the experiments was to evaluate the effectiveness of the pre-trained models, after customization and training, in identifying the correct disease diagnosis of the X-ray image. Moreover, since deep learning algorithms incur high overhead, the training and testing times were recorded too. Depending on the classification problem (three classes or two, and type of disease), the number of testing images ranged from 45 to 101.
Tables 1 and 2 show the performance evaluation metrics for classifying X-ray images into normal, scoliosis, or spondylolisthesis. DenseNet-201 achieved the highest accuracy over the three statistical measures, with a mean of 96.34%, maximum of 99.01%, and minimum of 94.06%. On the other hand, the baseline EfficientNet model performed the worst with an average accuracy of 87.92%, although NASNet-Mobile scored the lowest minimum accuracy of 78.22%. The latter displayed the highest variation in accuracy values, with a standard deviation of 4.8%. The other performance metrics display a consistent and homogeneous ability to identify negative as well as positive cases, with a similar performance pattern to the accuracy results (i.e., DenseNet-201 achieving the best results). The F1 score is of special importance as the dataset is imbalanced due to the scoliosis class having more images in comparison to the other two. Thus, the accuracy values may be misleading, but this is not the case as the F1 score reflects a similar performance over all classes.
Tables 3 and 4 show the performance evaluation metrics for classifying X-ray images into normal or scoliosis. ResNet-101 and ResNet-18 achieved the highest mean accuracy (i.e., 97.66%), although the ResNet-18 model is smaller and faster. Since this is an easier classification problem than the three-class one, all models achieved high accuracy values with less standard deviation over multiple runs. However, the NASNet-Mobile model had a 4.55% SD. Similarly, the F1 score and other metrics display consistently good performance over all classes.
Tables 5 and 6 show the performance evaluation metrics for classifying X-ray images into normal or spondylolisthesis. Most models achieved very high mean accuracy (> 96%), with ResNet-101 achieving the highest value of 99.33%. Several models achieved a maximum accuracy of 100%. However, the NASNet-Mobile model achieved the lowest accuracy with high fluctuation over several runs (5.18% SD), along with the DarkNet-53 model (4.94% SD).
Tables 7 and 8 show the performance evaluation metrics for classifying X-ray images into scoliosis vs spondylolisthesis. The performance of all models drops, although to varying degrees, as they try to differentiate between two disease states. Nonetheless, DenseNet-101 achieved a high mean accuracy of 97%. One notable difference from the other classification results is that some models achieved a low minimum accuracy (Inception-ResNet-v2: 78.75% and 4.97% SD, NASNet-Mobile: 73.75% and 6.12% SD). In addition, almost all models displayed greater standard deviation. This indicates the sensitivity of the results to the type of
Since deep learning models are computation intensive, we have compared the time required to train and test each model. Table 9 shows the mean training and validation times for each of the 14 deep learning models for the four types of classification problems in this work. As the table shows, the smaller the dataset, the less time is required by all models. SqueezeNet required the least time and is very fast in comparison to all others. However, the time required by the highest accuracy models (DenseNet-201, ResNet-18, and ResNet-101) is somewhat reasonable. On the other hand, NASNet-Mobile is extremely slow and achieved the lowest accuracies throughout.
Table 10 shows a comparison to the related work in the literature in terms of performance.
Although the related literature produced high accuracy values, these approaches [1,12,17] require extensive and error-prone measurement of the biomechanical parameters that indicate the specific disease case, which is not required by our approach. To our knowledge, no other study has included deep learning in the classification of scoliosis vs spondylolisthesis vs normal X-ray images. Colombo et al. [14] addressed the problem of healthy vs scoliosis classification and achieved a low accuracy of 85% at their best. Similarly, Wang et al. [15] could not achieve high accuracy in scoliosis progression detection, and Yang et al. [16] achieved an average accuracy of 80% for distinguishing scoliosis severity based on the Cobb angle (< 10˚, 10˚-19˚, 20˚-44˚, or ≥ 45˚). On the other hand, the work in this paper achieves superior accuracy with less input processing/measurements, although there is no exactly comparable literature. Nonetheless, the work in this paper can be further improved by:
• Including images of more vertebral column diseases (e.g., disc degeneration, spondylitis, osteoporosis, etc.) in a global image data store similar to ImageNet.
• Developing algorithms and using transfer learning to pinpoint faulty vertebrae or the exact location of the spine anomaly.
• Multistage classification, in which images are first classified into the corresponding disease state, followed by localization or severity grading.
• Continual learning through the development and deployment of mobile applications to aid physicians, collect data, and refine the AI models.

Conclusion
Artificial intelligence-aided diagnosis systems are being proposed and deployed in many medical areas. These systems have many advantages such as aiding undermanned remote areas, reducing human errors, and optimizing costs. In this paper, it has been shown that deep transfer learning using locally collected X-ray images is able to achieve high performance in terms of correctly distinguishing normal subjects from those suffering from scoliosis or spondylolisthesis. The highest mean accuracy values ranged from 96.34% for three-class classification to > 97% for the other classification problems. Even though deep learning incurs high overhead, the results show that training and validation can be performed in a reasonably short time using off-the-shelf hardware resources.
Deep transfer learning can be used to perform spondylolisthesis and scoliosis screening in order to improve the selection of patients who would require further costly CT or MRI imaging. Moreover, the work in this paper can be further improved and made more robust by larger databases with more images and more diseases. In addition, field deployment will allow practical benefits and continuous improvements.

PLOS ONE